ABSTRACT

A new interpretation is given, which provides another way of understanding
the structure of the species problem and sheds light on the properties of a general
coverage problem. As an illustrative example, the popular Turing-Good-Robbins estimator is
“shown” to be a natural choice from this interpretation in the species problem. We set up
a general framework of various coverage problems in this paper. The new interpretation is
applied to this general situation, which leads to many interesting applications in addition
to the species problem. The coverage problems considered in this paper include the species
problem, the problem of estimating the volume of a convex set, and the missile-coverage
problem. It is pointed out that the general estimators derived from this new interpretation
usually estimate the probabilistic phenomenon involving only “n - 1” observations,
which may not be appropriate. A general modified procedure is thus suggested to improve
the current estimators. To justify the interpretation theoretically, we present some limit
theorems in terms of the species problem, even though the results are expected to hold more
generally.



1.  Introduction 

The problem of estimating the total probability of unseen species goes back to A.M.
Turing according to Good (1953). To describe the problem comprehensively, we use the
notation of Robbins (1956, 1968). Let $\{e_1, e_2, e_3, \ldots\}$ be the possible distinct species with
probabilities $\{p_1, p_2, p_3, \ldots\}$ of being selected in a single experiment. In $n$ independent trials
suppose that $n_r$ species appear $r$ times, $r = 1, 2, \ldots$, so that $\sum_{r=1}^{\infty} r\,n_r = n$. We also use $n_0$ to denote
the number of species which are not present in the sample. It is clear that $n_1, n_2, \ldots$ are
observable, but $n_0$ is not. In fact $n_0$ is infinite if there are infinitely many species. Let
$\{X_i = j\}$ if and only if the $i$th trial results in outcome $e_j$.

For $r \ge 0$, let $\varphi_j(r; n) = 1$ if the number of trials $i \le n$ with $X_i = j$ is $r$, and 0 otherwise. In particular,
the sum of the probabilities $p_j$ for those species which are not observed is

(1.1) $C_0 = \sum_{j=1}^{\infty} p_j\,\varphi_j(0; n)$.

More generally, the sum of the probabilities of all species that are each represented $r$ ($r \ge 0$)
times in the sample is

(1.2) $C_r = \sum_{j=1}^{\infty} p_j\,\varphi_j(r; n)$.

To estimate $C_r$, Turing (see Good (1953)) suggested the formulas:

(1.3) $\hat{C}_r = \dfrac{(r+1)\,n_{r+1}}{n}$ for $r \ge 0$.


Using  a  uniform  prior,  Good  (1953)  gave  a  derivation  of  these  estimators  from  a 
Bayesian  point  of  view.  Since  then  several  other  interpretations  of  these  estimators  have 
appeared  in  the  literature.  These  include  Good  (1953),  Robbins  (1956, 1968),  and  Diaconis 
and  Stein  (1983)  among  others.  Various  justifications  of  this  type  of  estimator  have  been 
given.  It  should  be  noted  that  Robbins  (1968)  constructed  an  “unbiased”  estimator  for 
Co  which  is  very  similar  to  (1.3).  However,  Robbins’  estimator  is  justified  through  the 
device  of  adding  an  additional  trial  to  the  original  n  observations.  Here  an  estimator  is 
called  “unbiased”  for  estimating  a  random  variable  if  E(estimate)  =  E(random  variable). 
The problem continues to attract the attention of many researchers. To name a few: Starr
(1979), Clayton and Frees (1987), Esty (1986), Bickel and Yahav (1985), and Cohen and
Sackrowitz (1988). Most works concern the properties of the estimators of type (1.3), either
from asymptotic or decision theory points of view. As an important application, the species
problem is currently of great interest to researchers in automated speech identification
(Bahl et al. (1983), Jelinek (1976), and Katz (1987) among others).

My object is to introduce another interpretation of these estimators which leads to
interesting applications other than the species problem. Later in this section we shall
outline  my  approach  using  the  species  problem  as  an  illustrative  example.  As  a  consequence 
it  will  become  quite  clear  why  the  estimators  of  type  (1.3)  are  “natural  choices”  in  the 
species  problem. 

In  Section  2  a  framework  for  a  general  coverage  problem  is  introduced.  Some  general 
estimates  and  their  properties  are  derived  using  my  interpretation.  It  is  pointed  out  that 
the general estimates (including (1.3) in the species problem) derived from the interpretation
are usually “biased” slightly upward. A general modified procedure is suggested to
reduce  the  biases.  The  success  of  this  procedure  depends  heavily  upon  the  nature  of  the 
underlying  problems.  Although  the  biases  are  relatively  small  for  many  applications,  their 
reduction  seems  to  be  interesting  from  a  theoretical  point  of  view. 

Section 3 consists of three subsections, 3.1-3.3, which display three special examples
as direct applications of the general framework established in Section 2. It seems to this
author that the range of potentially useful applications is broader than presented here. The
first example is a further discussion of the species problem. The second example concerns
the problem of estimating the volume of an arbitrary convex figure in Euclidean space.
The connection between the interpretation and the problem of estimating the volume of
a convex polyhedron was pointed out to me by Diaconis in a conversation. Some new
results related to this problem on the plane are given, and the structure of the problem in
higher dimensions is sketched heuristically. The last example deals with a missile-coverage
problem:

“n missiles are delivered and land at a certain target area which is usually larger
than the ‘effective area’ caused by the explosion of a single missile. The typical
questions we are interested in are: (1) if the (n+1)th missile is fired, what is the chance
that  this  additional  missile  would  involve  area  which  was  not  covered  previously?  (2) 
How  large  is  the  newly  covered  area?  (3)  How  many  more  missiles  are  needed  to  cover 
90%  of  the  target  area?” 

We shall provide most of the answers to these questions in Section 3.3.

Section  4  is  rather  technical,  where  we  shall  give  some  limit  theorems  in  terms  of  the 
species problem. In order to present the idea simply and clearly, we have chosen to treat
special  cases,  even  though  the  results  are  expected  to  hold  more  generally. 

The  main  purpose  of  this  paper  is  to  set  up  a  framework  including  various  coverage 
problems so that the relevant parameters can be estimated by estimators which are obvious
choices through the interpretations. Now we shall use the species problem as an example
to  give  the  flavor  of  the  interpretation. 

Suppose we are interested in the probability $C_r$ in the species problem. Let $X_{n+1}$
denote the additional observation. The random probability $C_r$ is identical to the following
conditional probability

(1.4) $P\{X_{n+1} \in S_n(r) \mid X_1, X_2, \ldots, X_n\}$,

where $S_n(r) = \{j : \varphi_j(r; n) = 1\}$. Based on $n$ observations, it is natural to estimate

(1.4') $P\{X_j \in S_{n-1,j}(r) \mid A_{n,j}\}$ for all $1 \le j \le n$

by $I_{S_{n-1,j}(r)}(X_j)$, where

$A_{n,j} = \bigcup_{i \ne j} \{X_i\}$, $\quad S_{n-1,j}(r) = \{i : \varphi_{ij}(r; n-1) = 1\}$,

and $\varphi_{ij}(r; n-1) = 1$ if and only if $i$ appears exactly $r$ times in $A_{n,j}$.

With the continuity property, it is expected that (1.4) and (1.4') are close to each other.
(A general discussion of this “closeness” is given in Section 2.) It is then natural to estimate

(1.5) $\dfrac{1}{n} \sum_{j=1}^{n} P\{X_j \in S_{n-1,j}(r) \mid A_{n,j}\}$ by

(1.6) $\dfrac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}(r)}(X_j)$.

Since

(1.7) $I_{S_{n-1,j}(r)}(X_j) = 1 \iff X_j \in S_n(r+1)$,

the estimate (1.6) can thus be rewritten as

(1.8) $\dfrac{(r+1)\,n_{r+1}}{n}$,

which is exactly the formula suggested by Turing and studied by Good (1953, 1956).

By taking expectation, we obtain

(1.9) $E\big(P\{X_{n+1} \in S_n(r) \mid X_1, X_2, \ldots, X_n\}\big) = P\{X_{n+1} \in S_n(r)\}$,

(1.10) $E\big(P\{X_j \in S_{n-1,j}(r) \mid A_{n,j}\}\big) = P\{X_n \in S_{n-1}(r)\}$, and

(1.11) $E\Big(\dfrac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}(r)}(X_j)\Big) = P\{X_n \in S_{n-1}(r)\}$.

Therefore, $(r+1)\,n_{r+1}/n$ is an “unbiased” estimate of $P\{X_n \in S_{n-1}(r) \mid X_1, X_2, \ldots, X_{n-1}\}$
in the sense that both random quantities have the same expectation. This is contrasted
with Robbins’ arguments (1968), where “unbiasedness” was proved in the case $r = 0$
through direct calculations. Here, the “unbiasedness” is shown more generally with no
calculation.
We saw that the estimator is the average of naive estimators based on samples of
size $n-1$, and it estimates $P(X_n \in S_{n-1}(r))$, a probabilistic statement based on “$n-1$”
observations.  As  an  estimator  of  (1.4),  (1.8)  is  biased.  This  bias  is  slight  because  (1.4) 
changes  little  as  n  increases.  In  Section  3  we  shall  improve  this  estimator  to  reduce  the 
bias  which  could  be  substantial  in  other  problems  for  which  this  approach  applies. 

The key idea of this approach is to create the required information by temporarily
deleting one observation from the sample, one at a time; the required information
is obtained by comparing the deleted observation with the remaining $n-1$
observations. The final estimate is obtained by taking the average
over  these  n  steps,  and  it  is  no  surprise  that  the  final  estimator  really  estimates 
the  probabilistic  phenomenon  involving  n  —  1  observations.  Even  though  the  idea  behind 
the  procedure  is  simple,  it  can  be  generalized  to  a  fairly  general  model  which  is  the  subject 
of  the  next  section. 
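
The delete-one procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: species are represented by hashable labels, and the function names `leave_one_out_estimate` and `turing_good_estimate` are hypothetical, not taken from the paper. The first function literally removes each observation in turn and checks whether it appears exactly r times among the remaining n - 1 observations; the second evaluates the closed form (r+1) n_{r+1} / n. On any sample the two should agree.

```python
from collections import Counter


def leave_one_out_estimate(sample, r):
    """Average over j of the indicator that X_j appears exactly r times
    among the other n - 1 observations (hypothetical helper name)."""
    n = len(sample)
    hits = 0
    for j in range(n):
        others = sample[:j] + sample[j + 1:]   # temporarily delete X_j
        if others.count(sample[j]) == r:       # compare X_j with the rest
            hits += 1
    return hits / n


def turing_good_estimate(sample, r):
    """Closed form (r + 1) * n_{r+1} / n, where n_{r+1} is the number of
    species seen exactly r + 1 times (hypothetical helper name)."""
    n = len(sample)
    species_counts = Counter(sample)
    n_r_plus_1 = sum(1 for c in species_counts.values() if c == r + 1)
    return (r + 1) * n_r_plus_1 / n


if __name__ == "__main__":
    sample = ["a", "b", "b", "c", "c", "c", "d"]
    for r in range(3):
        print(r, leave_one_out_estimate(sample, r), turing_good_estimate(sample, r))
```

The brute-force version costs O(n^2) and is written that way only to mirror the description; the closed form needs a single pass over the species counts.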

2.  A  General  Coverage  Problem 

In this section we shall discuss a general coverage problem in which a random sample
$X_1, \ldots, X_n$ of size $n$ is observed from a certain probability space $(\Omega, F, P)$. Let $\Omega$ denote
a collection of certain subsets of a fixed set $A$ in $\mathbb{R}^k$, $k \ge 1$, whereas $F$ and $P$ are an
appropriate $\sigma$-field and a probability measure defined on $F$.

Typical sample outcomes of $X_1, \ldots, X_n$ are $n$ subsets of $A$. Considering all possible finite
intersections among $\{X_i\}_{i=1}^{n}$ and $A$, it is easy to check that these intersections result in
a finite partition of $A$ with $2^n$ disjoint subsets of $A$. Let $g$ be a well-defined
function from $\Omega$ to $\mathbb{R}^k$. Some of the problems we wish to solve are the following:

Given a specified subset $S_n = S(X_1, X_2, \ldots, X_n; P)$ of $A$, possibly depending on both
$\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $P$, estimate

(i) the probability that $g(X_{n+1}) \in S_n$ given $S_n$. Furthermore, if all elements
in $\Omega$ are Lebesgue-measurable, we are interested in estimating

(ii) the expected volume of $S_n \cap X_{n+1}$ given $S_n$, and

(iii) the expected volume of $S_{n+1} \cap S_n$ given $S_n$ if an additional sample $X_{n+1}$ is
made.

We assume throughout this section that $S_n$ is defined for every $n \ge 1$. The key idea can
best be described as a one-step “backward” procedure as follows. Let $X_j$ be randomly
removed from the sample $\{X_1, X_2, \ldots, X_n\}$, and let $A_{n,j}$ denote the sample with $X_j$ removed,
i.e., $A_{n,j} = \bigcup_{i \ne j} \{X_i\}$. Let $S_{n-1,j} = S(A_{n,j}; P)$ be the specified subset of $A$ based on the sample
$A_{n,j}$ of size $n-1$. We further define an indicator function

$$I_{S_{n-1,j}}[g(X_j)] = \begin{cases} 1 & \text{if } g(X_j) \in S_{n-1,j}, \\ 0 & \text{otherwise.} \end{cases}$$


As pointed out in the previous section, our procedure will lead to some estimators
which estimate the probabilistic statements involving “$n-1$” observations. For this reason,
we shall call them “$(n-1)$-estimators” hereafter.

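Instead of estimating the probability $P(g(X_{n+1}) \in S_n \mid S_n)$ in (i), the “$(n-1)$-estimator”
estimates $P(g(X_n) \in S_{n-1} \mid S_{n-1})$. The construction can be described as follows:

(i') In order to estimate $P(g(X_n) \in S_{n-1} \mid S_{n-1})$, note that the probability that $g(X_j) \in S_{n-1,j}$
can be estimated by $I_{S_{n-1,j}}[g(X_j)]$ empirically. In fact, this estimator is “unbiased”
in the sense that

$$E\big(I_{S_{n-1,j}}[g(X_j)]\big) = P\{g(X_n) \in S_{n-1}\} = E\big(P(g(X_n) \in S_{n-1} \mid S_{n-1})\big).$$

Since $X_j$ is randomly removed from the sample, a final estimator ($(n-1)$-estimator)
is thus

$$\frac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}}[g(X_j)],$$

which is also “unbiased.”

Likewise, instead of estimating (ii) we estimate

(ii') $E\big(\mathrm{vol}[S_{n-1} \cap X_n] \mid S_{n-1}\big)$.

Consider the estimator $\mathrm{vol}[S_{n-1,j} \cap X_j]$ for all $1 \le j \le n$. It is clear that

$$E\big(\mathrm{vol}[S_{n-1,j} \cap X_j]\big) = E\big(\mathrm{vol}[S_{n-1} \cap X_n]\big) = E\big(E(\mathrm{vol}[S_{n-1} \cap X_n] \mid S_{n-1})\big) \quad \text{for all } 1 \le j \le n,$$

so the $(n-1)$-estimator is thus

$$\frac{1}{n} \sum_{j=1}^{n} \mathrm{vol}[S_{n-1,j} \cap X_j]$$

and is also “unbiased.”

For estimating

(iii') $E\big(\mathrm{vol}[S_n \cap S_{n-1}] \mid S_{n-1}\big)$,

we consider the estimator $\mathrm{vol}(S_{n-1,j} \cap S_n)$ for all $1 \le j \le n$. Again, it is easy to see that

$$E\big(\mathrm{vol}[S_{n-1,j} \cap S_n]\big) = E\big(\mathrm{vol}[S_n \cap S_{n-1}]\big) = E\big(E(\mathrm{vol}[S_n \cap S_{n-1}] \mid S_{n-1})\big) \quad \text{for all } 1 \le j \le n.$$

The final $(n-1)$-estimator is

$$\frac{1}{n} \sum_{j=1}^{n} \mathrm{vol}(S_{n-1,j} \cap S_n).$$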

Remark. The assumptions made above about the sampling plan can be further relaxed.
In fact, one can check that the only assumption we need (to guarantee the conclusion) is
$\mathcal{L}(X_1, \ldots, X_n \mid P) = \mathcal{L}(X_{\pi 1}, X_{\pi 2}, \ldots, X_{\pi n} \mid P)$ for any permutation $\pi$ on $\{1, 2, \ldots, n\}$ for every
$n$ (here $\mathcal{L}$ denotes the joint distribution). In particular, if $(X_1, \ldots, X_n)$ are exchangeable random elements, all the conclusions
discussed above still hold.

If $S_n = S(X_1, X_2, \ldots, X_n; P) = S(P)$ does not depend on $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$, it
is easy to check that our interpretation will lead to an estimator which is the well-known
estimator obtained by the empirical measure.

As estimators of (i), (ii), and (iii), these $(n-1)$-estimators are all “biased.” In many
applications the biases are slight because (i), (ii), and (iii) change little as $n$ increases.
We shall refer to this property as the continuity property. However, in our general framework,
this property is not automatically guaranteed. As a result, just how well these $(n-1)$-estimators
estimate (i), (ii), and (iii) depends upon the forms of $S_n$ and $S_{n-1}$. The following
proposition tells us that the success of using $(n-1)$-estimators to estimate (i), (ii), and (iii)
depends on the “closeness” of $S_{n-1}$ to $S_n$.

Proposition 2.1. Assume that $X$ is randomly chosen from $(\Omega, F, P)$ and is independent
of $S_{n-1}(X_1, X_2, \ldots, X_{n-1}; P)$, $S_n(X_1, X_2, \ldots, X_n; P)$ and $S_{n+1}(X_1, X_2, \ldots, X_n, X_{n+1}; P)$.
Let $g$ be a measurable function from $(\Omega, F, P)$ to $\mathbb{R}^k$ such that $g(\omega) \in \omega$ for all $\omega \in \Omega$.
We further assume $E\,\mathrm{vol}(X)^2 < \infty$.
If

$$P\{[X \cap \{(S_n \cup S_{n-1}) \setminus (S_n \cap S_{n-1})\}] \ne \emptyset\} = \delta_n > 0 \quad \text{for all } n \ge 1,$$

then

(1) $|P\{g(X_{n+1}) \in S_n\} - P\{g(X_n) \in S_{n-1}\}| \le \delta_n$ and

(2) $E(\mathrm{vol}[S_n \cap X_{n+1}]) - E(\mathrm{vol}[S_{n-1} \cap X_n]) = O(\delta_n^{1/2})$.

If we further assume $\mathrm{vol}(A) < \infty$, then (2) becomes

(2') $E(\mathrm{vol}[S_n \cap X_{n+1}]) - E(\mathrm{vol}[S_{n-1} \cap X_n]) = O(\delta_n)$.

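Proof of (1).

It suffices to show

$$|P\{g(X) \in S_n\} - P\{g(X) \in S_{n-1}\}| \le \delta_n.$$

Since

$$P\{g(X) \in S_n\} - P\{g(X) \in S_{n-1}\} = P\{g(X) \in S_n \setminus S_{n-1}\} - P\{g(X) \in S_{n-1} \setminus S_n\},$$

it follows from the assumptions that both terms above are smaller than

$$P\{[X \cap \{(S_n \cup S_{n-1}) \setminus (S_n \cap S_{n-1})\}] \ne \emptyset\} = \delta_n,$$

and the proof of (1) follows immediately.

Proof of (2). One can write

$$|E(\mathrm{vol}[S_n \cap X_{n+1}]) - E(\mathrm{vol}[S_{n-1} \cap X_n])|$$
$$= |E(\mathrm{vol}[S_n \cap X] - \mathrm{vol}[S_{n-1} \cap X])|$$
$$= |E(\mathrm{vol}[(S_n \cap X) \setminus (S_{n-1} \cap X)]) - E(\mathrm{vol}[(S_{n-1} \cap X) \setminus (S_n \cap X)])|.$$

Both of the above terms are clearly bounded by

$$E\big(\mathrm{vol}(X \cap \{[S_n \cup S_{n-1}] \setminus [S_n \cap S_{n-1}]\})\big)$$
$$\le E\big(\mathrm{vol}(X) \cdot I_{[S_n \cup S_{n-1}] \setminus [S_n \cap S_{n-1}]}(X)\big)$$
$$\le \big(E\,\mathrm{vol}(X)^2\big)^{1/2}\, \delta_n^{1/2} = O(\delta_n^{1/2}).$$

This completes the proof of (2). If $\mathrm{vol}(A) < \infty$, then since $\mathrm{vol}(X) \le \mathrm{vol}(A)$ w.p.1,
it follows that

$$E\big[\mathrm{vol}(X) \cdot I_{[S_n \cup S_{n-1}] \setminus [S_n \cap S_{n-1}]}(X)\big] \le \mathrm{vol}(A) \cdot \delta_n = O(\delta_n),$$

which completes the proof of (2').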

The “Biases” of $(n-1)$-Estimates

As we have shown, in (i), (ii), (iii), the proposed $(n-1)$-estimates are “unbiased”
in estimating the probabilistic statements (involving only $n-1$ observations), which are
different from those based on $n$ observations. In other words, there would be some biases
if we use these $(n-1)$-estimates.

To calculate the biases, we pretend the additional observation $X_{n+1}$ is taken. The
$(n)$-estimates obtained by applying (i), (ii), and (iii) to these $n+1$ observations should be
“unbiased.” Therefore, the biases of $(n-1)$-estimates can be evaluated by comparing these
$(n-1)$-estimates with $(n)$-estimates. For example, as in (i), the $(n-1)$-estimate is

$$\frac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}}[g(X_j)]$$

and the $(n)$-estimate is

$$\frac{1}{n+1} \sum_{j=1}^{n+1} I_{S_{n,j}}[g(X_j)].$$

The “bias” of the $(n-1)$-estimate is thus

(2.1) $\dfrac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}}[g(X_j)] - \dfrac{1}{n+1} \sum_{j=1}^{n+1} I_{S_{n,j}}[g(X_j)]$,

where

$$S_{n,j} = S(A_{n+1,j}; P), \quad \text{and} \quad A_{n+1,j} = \bigcup_{i \ne j,\ i \le n+1} \{X_i\}.$$

The bias term (2.1) can be calculated once the knowledge of the “relationship” between
$S_{n-1}$ and $S_n$ is provided, and this is possible only if the nature of the problem is specifically
given. In this case, as we shall see in the next section, some better estimators are always
available. Here, “better” means smaller “biases.” The key idea of constructing these better
estimators is to estimate $X_{n+1}$ by the current sample $\{X_1, \ldots, X_n\}$ first. The final estimate
is obtained as if we had “$n+1$” observations. The idea is closely related to the idea of the
EM algorithm (see Dempster et al. (1977)).

3.  Examples 

3.1  Species  Problems 

In this section we shall continue our discussion of species problems introduced in
Section 1. The problem of estimating the total probability of unseen species can be put in the
framework of the general coverage problem as in the previous section. Let $E = \{e_1, e_2, \ldots\}$ be
the possible distinct species with probabilities $p_1, p_2, \ldots$ of being selected in a single
experiment. Let $A$ denote the set of all positive integers. Let us make a natural correspondence
between the outcome space $E$ and the set $A$ by “$e_i \leftrightarrow i$.” The correspondence allows us to
treat $X_j$ as a random variable such that $\{X_j = i\} \iff$ the $j$th trial results in outcome $e_i$.
It follows that in this case $\Omega = A$, $F = 2^A$ and $P\{X = i\} = p_i$ for $i \in A$.

Having observed $X_1, X_2, \ldots, X_n$, the collection of unseen species can be expressed as

$$S_n = S(X_1, X_2, \ldots, X_n; P) = \{j : j \notin \{X_1, \ldots, X_n\}\} \subset A.$$

Let $g$ denote the identity map from $\Omega$ to $A$, i.e., $g(i) = i$. The problem of estimating
the total probability of unseen species is thus equivalent to estimating the probability of
$g(X_{n+1}) \in S_n$ given $S_n$. More precisely,

$$P\{g(X_{n+1}) \in S_n \mid S_n\}.$$

According to the previous section, the $(n-1)$-estimate as in (i') is

(3.1.1) $\dfrac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}}(X_j) = \dfrac{n_1}{n}$,

where

$$S_{n-1,j} = A \setminus A_{n,j}, \quad A_{n,j} = \bigcup_{i \ne j} \{X_i\}.$$

Suppose we want to estimate the total probability of all species that appear $r$ ($r \ge 1$) times
in the sample. By a similar argument,

$$S_n(r) = S(X_1, X_2, \ldots, X_n; P, r) = \Big\{i : \sum_{j=1}^{n} I_{X_j}(i) = r,\ i \in A\Big\},$$

$$S_{n-1,j}(r) = S(A_{n,j}; P, r) = \Big\{i : \sum_{X_k \in A_{n,j}} I_{X_k}(i) = r,\ i \in A\Big\}.$$

Since

$$X_j \in S_{n-1,j}(r) \iff X_j \in S_n(r+1),$$

it follows from this fact that the $(n-1)$-estimate in this case (as in (i') again) takes the
form

(3.1.2) $\dfrac{1}{n} \sum_{j=1}^{n} I_{S_{n-1,j}(r)}(X_j) = \dfrac{(r+1)\,n_{r+1}}{n}$,

which is formula (1.8).


The “Biases” of $(n-1)$-Estimates

From (2.1) and (3.1.1), the “bias” of the $(n-1)$-estimate in estimating $P\{X_{n+1} \in S_n \mid S_n\}$ is

$$\frac{n_1}{n} - \frac{n_1 + \xi}{n+1},$$

where

$$\xi = \begin{cases} 1 & \text{if } X_{n+1} \notin \{X_1, \ldots, X_n\}, \\ 0 & \text{if } X_{n+1} \text{ occurred more than once among } \{X_1, \ldots, X_n\}, \\ -1 & \text{if } X_{n+1} \text{ occurred once among } \{X_1, \ldots, X_n\}. \end{cases}$$

It follows trivially that

(3.1.3) $|\text{bias of } (n-1)\text{-estimate}| \le \dfrac{2}{n+1}$.

The knowledge of the relationship between $S_{n-1}$ and $S_n$ enables us to construct a better
estimate whose bias is of order $O(1/n^2)$, in contrast with the order $O(1/n)$ provided by the
previous $(n-1)$-estimate. The construction can be described heuristically as follows.
Let $n_1''$ denote the number of species appearing once in the sample $\{X_1, \ldots, X_n, X_{n+1}\}$.
Since $X_{n+1}$ is missing, we cannot observe $n_1''$, but instead we can estimate $n_1''$
based on $\{X_1, X_2, \ldots, X_n\}$. Let $n_1'$ denote this estimate, which is defined by

$$n_1' = \begin{cases} n_1 + 1 & \text{with prob. } n_1/n, \\ n_1 & \text{with prob. } 1 - n_1/n - 2n_2/n, \\ n_1 - 1 & \text{with prob. } 2n_2/n. \end{cases}$$

The expected value of $n_1'$ given $(n_1, n_2, \ldots)$ is

(3.1.4) $E\big(n_1' \mid (n_1, n_2, \ldots)\big) = n_1 + n_1 n^{-1} - 2 n_2 n^{-1}$.

The final estimate of the total probability of unseen species in the sample
$\{X_1, \ldots, X_n\}$ is

(3.1.5) $\dfrac{E\big(n_1' \mid (n_1, n_2, \ldots)\big)}{n+1} = \dfrac{n_1 + n_1 n^{-1} - 2 n_2 n^{-1}}{n+1}$.

The fact that the bias of this estimate is of order $O(1/n^2)$ can be seen by noting that

(3.1.6) $E\Big[\dfrac{n_1 + n_1 n^{-1} - 2 n_2 n^{-1}}{n+1} - \dfrac{n_1''}{n+1}\Big] = \dfrac{1}{n+1}\, E\Big[\Big(\dfrac{n_1}{n} - \dfrac{2 n_2}{n}\Big) - \Big(\dfrac{n_1''}{n+1} - \dfrac{2 n_2''}{n+1}\Big)\Big]$,

where $n_2''$ is the number of species appearing twice among $\{X_1, X_2, \ldots, X_n, X_{n+1}\}$.

It is clear that $|n_1 - n_1''| \le 1$ and $|2 n_2 - 2 n_2''| \le 2$ with probability one. It follows from
(3.1.5) and (3.1.6) that the absolute bias of (3.1.5) is bounded by a constant multiple of
$(n+1)^{-2}$, which is of order $O(1/n^2)$.

One can mimic the above idea to find an estimator which is “better” than (3.1.2) in
estimating the total probability of all species that appear $r$ ($r \ge 1$) times in the sample. The
improved estimator is

(3.1.7) $\big[(r+1)\,n_{r+1} + (r+1)\big((r+1)\,n_{r+1} - (r+2)\,n_{r+2}\big)\,n^{-1}\big]\,(n+1)^{-1}$,

which has smaller bias.
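
As a small numerical companion to the unseen-species case, the basic $(n-1)$-estimate $n_1/n$ and the modified estimate $(n_1 + n_1 n^{-1} - 2 n_2 n^{-1})/(n+1)$ can be computed side by side. The sketch below assumes the sample is a list of hashable species labels; the function name `unseen_mass_estimates` is hypothetical and is not part of the paper.

```python
from collections import Counter


def unseen_mass_estimates(sample):
    """Return (basic, improved) estimates of the total probability of
    unseen species. Hypothetical helper for illustration only.

    basic    = n1 / n
    improved = (n1 + n1/n - 2*n2/n) / (n + 1)
    where n1, n2 are the numbers of species seen exactly once and twice.
    """
    n = len(sample)
    # Frequency of frequencies: freq_of_freq[r] = number of species seen r times.
    freq_of_freq = Counter(Counter(sample).values())
    n1, n2 = freq_of_freq[1], freq_of_freq[2]
    basic = n1 / n
    improved = (n1 + (n1 - 2 * n2) / n) / (n + 1)
    return basic, improved


if __name__ == "__main__":
    # n = 7, n1 = 2 (species a and d), n2 = 1 (species b)
    print(unseen_mass_estimates(list("abbcccd")))
```

The correction term is written as `(n1 - 2 * n2) / n` so that the two adjustments, a possible new singleton and a possible singleton becoming a doubleton, are visible as a single signed quantity.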

3.2 Estimating the Volume of a Convex Set in $\mathbb{R}^k$

The problem of estimating the volume of a certain convex set can be described as
follows:

Let $V$ denote a certain unknown convex set with finite volume in $\mathbb{R}^k$. The data in this
problem consist of independent random samples $X_1, X_2, \ldots, X_n$ uniformly distributed
over $V$. The first question we want to ask is: having observed $X_1, X_2, \ldots, X_n$, how do we
estimate $\mathrm{vol}(V)$?

To answer this question, we first write down the joint likelihood of $X_1, \ldots, X_n$ as

(3.2.1) $\mathrm{Lik}(X_1, X_2, \ldots, X_n) = \Big[\dfrac{1}{\mathrm{vol}(V)}\Big]^n I(V_n \subset V)$,

where $V_n = V_n(X_1, X_2, \ldots, X_n)$ is the convex hull formed by $\{X_1, X_2, \ldots, X_n\}$, and
$I(A \subset B) = 1$ if $A \subset B$, 0 otherwise.

It is easy to see from (3.2.1) that $V_n$, the convex hull formed by $\{X_1, X_2, \ldots, X_n\}$,
is a sufficient statistic for $V$, according to Neyman’s factorization theorem. This suggests
that a reasonable estimate of $\mathrm{vol}(V)$ should be a function of $V_n$, the sufficient statistic for
$V$.

To construct an estimate of $\mathrm{vol}(V)$, we first consider the problem of estimating the
conditional probability $P(X_{n+1} \in V_n \mid V_n)$. As we shall see below, this problem can be
treated as a special case of our general coverage problem.

Let $\Omega = V = A$, and let $F$ be the usual Borel field on $V$. Let $P$ be the probability
measure uniformly distributed over $V$. Define $g(\omega) = \omega$, the identity map from $V$ to $V$. If
we define $S_n = S(X_1, \ldots, X_n; P) = V_n(X_1, X_2, \ldots, X_n)$, the $(n-1)$-estimate of
$P(X_{n+1} \in V_n \mid V_n)$ is

(3.2.2) $\dfrac{1}{n} \sum_{j=1}^{n} I_{V_{n-1,j}}(X_j)$,

where $V_{n-1,j}$ is the convex hull formed by $\bigcup_{i \ne j} \{X_i\}$. Since $X_{n+1}$ is uniformly
distributed over $V$, it follows that

$$P(X_{n+1} \in V_n \mid V_n) = \frac{\mathrm{vol}(V_n)}{\mathrm{vol}(V)},$$

that is,

(3.2.3) $\mathrm{vol}(V) = \dfrac{\mathrm{vol}(V_n)}{P(X_{n+1} \in V_n \mid V_n)}$.
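Substituting (3.2.2) for $P(X_{n+1} \in V_n \mid V_n)$, the $(n-1)$-estimate of $\mathrm{vol}(V)$ is

(3.2.4) $\widehat{\mathrm{vol}}(V) = \mathrm{vol}(V_n) \cdot \Big[\dfrac{1}{n} \sum_{j=1}^{n} I_{V_{n-1,j}}(X_j)\Big]^{-1}$.

As in Section 3.1, the estimates (3.2.2) and (3.2.4) can be further improved. From
(3.2.2), the $(n-1)$-estimate of $P(X_{n+1} \notin V_n \mid V_n)$ is

(3.2.5) $\dfrac{\#\text{ of vertices of } V_n}{n}$.

Let $\mathrm{vtx}(U)$ denote the set of vertices of a convex polyhedron $U$ in $\mathbb{R}^k$. Applying the
similar idea of (3.1.4)-(3.1.7) to the current situation, we end up with a modified estimate
(of $P(X_{n+1} \notin V_n \mid V_n)$)

(3.2.6) $\dfrac{\#\{\mathrm{vtx}(V_n)\} + \frac{1}{n} \sum_{j=1}^{n} \big[\#\{\mathrm{vtx}(V_n)\} - \#\{\mathrm{vtx}(V_{n-1,j})\}\big]}{n+1}$,

where $\#\{\mathrm{vtx}(U)\}$ is the number of vertices of a convex polyhedron $U$. The modified estimates
of $P(X_{n+1} \in V_n \mid V_n)$ and $\mathrm{vol}(V)$ are thus

(3.2.7) $1 - \Big(\#\{\mathrm{vtx}(V_n)\} + \frac{1}{n} \sum_{j=1}^{n} \big(\#\{\mathrm{vtx}(V_n)\} - \#\{\mathrm{vtx}(V_{n-1,j})\}\big)\Big)(n+1)^{-1}$

and

(3.2.8) $\mathrm{vol}(V_n) \cdot \Big\{1 - \Big[\#\{\mathrm{vtx}(V_n)\} + \frac{1}{n} \sum_{j=1}^{n} \big(\#\{\mathrm{vtx}(V_n)\} - \#\{\mathrm{vtx}(V_{n-1,j})\}\big)\Big](n+1)^{-1}\Big\}^{-1}$,

respectively.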

It is not difficult to check that the “biases” of estimates (3.2.6) and (3.2.7) are of
smaller order ($O(1/n^2)$, in fact) than those of the $(n-1)$-estimates provided by (3.2.5) and (3.2.2).
Since the arguments to verify this fact are very similar to those given in Section 3.1, we
omit them.
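
For points in the plane, the basic $(n-1)$-estimates can be sketched directly: the estimated probability that a new point falls outside the hull is the number of hull vertices divided by $n$, and the volume estimate divides the hull area by one minus that proportion. The Python sketch below makes several assumptions of ours: points are `(x, y)` tuples, the hull is computed with Andrew's monotone chain algorithm (a standard method swapped in here, not taken from the paper), and the names `convex_hull`, `polygon_area` and `hull_estimates` are hypothetical. It also relies on the fact that, for a continuous distribution, an observation lies outside the hull of the others exactly when it is a hull vertex, with probability one.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would create a right turn or a collinear step.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    m = len(vertices)
    twice_area = 0.0
    for i in range(m):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % m]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def hull_estimates(points):
    """Return (estimated P(new point outside hull), estimated vol(V))."""
    n = len(points)
    hull = convex_hull(points)
    p_outside = len(hull) / n            # number of vertices over n
    if p_outside >= 1.0:
        raise ValueError("every observation is a hull vertex; volume estimate undefined")
    return p_outside, polygon_area(hull) / (1.0 - p_outside)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (0.5, 1), (1, 0.5), (1.5, 1.5)]
    print(hull_estimates(pts))
```

The guard against `p_outside >= 1.0` reflects that the basic volume estimate is undefined when no observation is interior to the hull, which can happen for very small samples.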
$$\frac{\mathrm{vol}(A_n) + \frac{1}{n} \sum_{j=1}^{n} \big[\mathrm{vol}(A_n) - \mathrm{vol}(A_{n-1,j})\big]}{n+1}$$

and $A_{n-1,j}$ is the convex hull formed by $\bigcup_{i \ne j} \{X_i\}$.

Before we move on to the next application, let us consider a simple example which
may add some heuristic feeling to what we have done so far.

Example 3.1. Suppose $X_1, X_2, \ldots, X_n$ are iid from $U(\theta_1, \theta_2)$, with unknown parameters $\theta_1$
and $\theta_2$. The “volume” (length, in fact) of the current convex set is $\theta_2 - \theta_1$. Let
$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ be the ordered values of $\{X_i\}_{i=1}^{n}$. It follows from the previous
discussion that the $(n-1)$-estimate of $P\big(X_{n+1} \in (X_{(1)}, X_{(n)}) \mid (X_{(1)}, X_{(n)})\big)$ is

(3.2.10) $1 - \dfrac{2}{n}$.

In fact, from (i') of Section 2 this $(n-1)$-estimate is an unbiased estimate of
$P\big(X_n \in (X_{(1)}, X_{(n-1)})\big)$ based on $n-1$ observations $\{X_i\}_{i=1}^{n-1}$. The “better” estimates of
$P\big(X_{n+1} \in (X_{(1)}, X_{(n)}) \mid (X_{(1)}, X_{(n)})\big)$ and $\theta_2 - \theta_1$ are (from (3.2.7)) thus


(3.2.11) $1 - \dfrac{2}{n+1} = \dfrac{n-1}{n+1}$

and

(3.2.12) $\dfrac{n+1}{n-1}\,\big(X_{(n)} - X_{(1)}\big)$,

respectively.
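
In the one-dimensional uniform case the convex hull is the interval between the sample minimum and maximum, so it always has two vertices. Under that observation the basic estimate of the interval length is the sample range divided by $1 - 2/n$, and the modified estimate is the range times $(n+1)/(n-1)$. The sketch below computes both; the function name `range_estimates` is hypothetical, and the sample values are arbitrary illustrative numbers.

```python
def range_estimates(xs):
    """Return (basic, improved) estimates of theta_2 - theta_1 for an
    iid uniform sample. Hypothetical helper for illustration only.

    basic    = range / (1 - 2/n)        two hull vertices out of n points
    improved = range * (n + 1) / (n - 1)
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations for the basic estimate")
    spread = max(xs) - min(xs)
    basic = spread / (1 - 2 / n)
    improved = spread * (n + 1) / (n - 1)
    return basic, improved


if __name__ == "__main__":
    print(range_estimates([0.1, 0.4, 0.9, 0.5, 0.3]))
```

Since the expected sample range of $n$ iid uniform observations is $(n-1)/(n+1)$ times the interval length, the modified estimate is exactly unbiased in this example, while the basic estimate overshoots on average.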

It is heuristically clear that the volume of $V_n$ would tend to the volume of $V$ as $n$ goes
to infinity. It is desirable to find the rate (and the distribution, if possible) at which the
volume of $V_n$ tends to that of $V$ as $n$ becomes large. As an application, we shall show in
the following that the problem can be solved in $\mathbb{R}^2$ via the interpretation together with a
recent result of Groeneboom (1988). Let $N_n$ be the number of vertices of $V_n$. If $V$ is a
convex polygon in $\mathbb{R}^2$ with $r$ edges, it was shown in Rényi and Sulanke (1963) that

$$E N_n \sim \tfrac{2}{3}\, r \log n \quad \text{as } n \to \infty.$$

It was also shown in the same paper that $E N_n / n^{1/3} \to$ constant if $V$ has a smooth boundary in
$\mathbb{R}^2$. Since then much work has been done in this direction: Efron (1965), Geffroy (1959,
1961), Raynaud (1970), Eddy and Gale (1981), Buchta (1984), and Schneider (1987) among
others.

In  his  recent  paper,  Groeneboom  (1988)  obtained  some  interesting  results  which  will 
be  stated  as  a  proposition. 

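Proposition 3.1. (Groeneboom (1988))

(1) If $V$ is a convex polygon with $r$ vertices, then, as $n \to \infty$,

$$\big(N_n - \tfrac{2}{3}\, r \log n\big) \Big/ \sqrt{\tfrac{10}{27}\, r \log n} \xrightarrow{d} N(0, 1).$$

(2) If $V$ is the unit disk on the plane, then, as $n \to \infty$,

$$\big(N_n - 2\pi c_1 n^{1/3}\big) \Big/ \sqrt{2\pi c_2 n^{1/3}} \xrightarrow{d} N(0, 1),$$

where $c_1, c_2$ are two positive constants between zero and one.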

From (3.2.5), the $(n-1)$-estimate $N_n/n$ is an unbiased estimate of
$P(X_n \notin V_{n-1}) = 1 - E\,\mathrm{vol}(V_{n-1})/\mathrm{vol}(V)$, that is,

$$E\Big(\frac{N_n}{n}\Big) = \frac{\mathrm{vol}(V) - E(\mathrm{vol}(V_{n-1}))}{\mathrm{vol}(V)}.$$

It follows that

(3.2.13) $E N_n = \dfrac{n\,[\mathrm{vol}(V) - E(\mathrm{vol}(V_{n-1}))]}{\mathrm{vol}(V)} = \begin{cases} \tfrac{2}{3}\, r \log n + o\big((\log n)^{1/2}\big) & \text{if } V \text{ is a polygon with } r \text{ vertices}, \\ 2\pi c_1 n^{1/3} + o(n^{1/6}) & \text{if } V \text{ is the unit disk.} \end{cases}$

Combining (3.2.13), Proposition 3.1, and the fact that $N_n/n$ is an unbiased estimate of
$P(X_n \notin V_{n-1})$, we have proved the following result.

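Theorem 3.1

(1) If $V$ is a convex polygon with $r$ ($r \ge 3$) vertices, then, as $n \to \infty$,

(3.2.14) $n\Big[\dfrac{N_n}{n} - P(X_n \notin V_{n-1})\Big] \Big/ \sqrt{\tfrac{10}{27}\, r \log n} \xrightarrow{d} N(0, 1)$,

(3.2.15) $n\Big[\dfrac{N_n}{n} - P(X_{n+1} \notin V_n)\Big] \Big/ \sqrt{\tfrac{10}{27}\, r \log n} \xrightarrow{d} N(0, 1)$,

(3.2.16) $n\Big[\dfrac{N_n}{n}\, \mathrm{vol}(V_n) - E[\mathrm{vol}(V \setminus V_{n-1})]\Big] \Big/ \Big[\sqrt{\tfrac{10}{27}\, r \log n} \cdot \mathrm{vol}(V)\Big] \xrightarrow{d} N(0, 1)$.

(2) If $V$ is the unit disk in the plane, then, as $n \to \infty$, we have

$$P(X_n \notin V_{n-1}) = O(n^{-2/3}),$$

(3.2.17) $n^{5/6}\Big[\dfrac{N_n}{n} - P(X_n \notin V_{n-1})\Big] \Big/ \sqrt{2\pi c_2} \xrightarrow{d} N(0, 1)$,

(3.2.18) $n^{5/6}\Big\{\dfrac{N_n}{n}\, \mathrm{vol}(V_n) - E[\mathrm{vol}(V \setminus V_n)]\Big\} \Big/ \Big[\sqrt{2\pi c_2}\, \mathrm{vol}(V)\Big] \xrightarrow{d} N(0, 1)$.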

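Note that (3.2.16) follows from the fact that

$$E(\mathrm{vol}(V \setminus V_{n-1})) = \mathrm{vol}(V)\, P(X_n \notin V_{n-1})$$

and

$$\frac{N_n}{\sqrt{\tfrac{10}{27}\, r \log n}}\, [\mathrm{vol}(V_n) - \mathrm{vol}(V)] = o_p(1),$$

since $N_n = O_p(\log n)$ and

$$\mathrm{vol}(V_n) - \mathrm{vol}(V) = O_p\Big(\frac{\log n}{n}\Big).$$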


Remark. In the case that $V$ is a general convex set with smooth boundary, the results in
(2) still hold, but with $C_2$ replaced by

$$C_2' = C_2\,\bigl(\pi/\mathrm{vol}(V)\bigr)^{1/3} \int_{\partial V} \kappa(s)^{1/3}\,ds \,\Big/\, 2\pi,$$

where $\partial V$ is the boundary of $V$ and $\kappa(s)$ is the curvature as a function of arc length. For details,
see Rényi and Sulanke (1963) and Groeneboom (1988).

Some implications deserve further discussion here. From (3.2.5), the probability that a
new observation will fall outside the convex hull formed by the sample $\{X_1, \ldots, X_n\}$
is determined by the number of vertices of the convex hull. This
result (i.e., (3.2.5)) holds for any distribution on $\mathbb{R}^k$ and any $k \ge 1$. However, to estimate
the volume of a convex body, the uniform distribution is used to create a relation like
(3.2.3).

We do not have a general theorem like Theorem 3.1 in $\mathbb{R}^k$ when $k \ge 3$ simply because
a more general version of Proposition 3.1 is not available at the moment. However, from
an applied point of view, we can always estimate the volume of a convex figure by Formula
(3.2.4), and the vertices of $V_n$ will provide us with information about $V \setminus V_n$. It seems to
this author that almost all relevant information about $V \setminus V_n$ is within the set of vertices
of $V_n$. This point will be further justified in Section 4 in terms of the species problem.
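The role of the vertex count can be illustrated numerically. The following is a minimal sketch, assuming uniform sampling from the unit disk: it computes the convex hull $V_n$, uses $N_n/n$ as the estimate of the probability that a new point falls outside the hull, and forms the volume estimate $\mathrm{vol}(V_n)/(1 - N_n/n)$. That last form is an assumption suggested by the relation $P(X_n \notin V_{n-1}) = 1 - E(\mathrm{vol}(V_{n-1}))/\mathrm{vol}(V)$; it is not claimed to be identical to Formula (3.2.4). All function names are hypothetical.

```python
import math
import random


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon whose vertices are given in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def sample_unit_disk(n, rng):
    """n independent uniform points in the unit disk, by rejection sampling."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            pts.append((x, y))
    return pts


rng = random.Random(0)
n = 2000
hull = convex_hull(sample_unit_disk(n, rng))
N_n = len(hull)                      # number of hull vertices
p_outside = N_n / n                  # estimate of P(new point falls outside the hull)
area_hull = polygon_area(hull)       # vol(V_n)
area_estimate = area_hull / (1.0 - p_outside)  # assumed form of the volume estimate
print(N_n, p_outside, area_hull, area_estimate, math.pi)
```

The hull area alone always underestimates the area of the disk; dividing by $1 - N_n/n$ corrects for the part of $V$ not yet reached, using only the vertex count.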
The following problem is of interest:

Let $V$ be a smooth convex figure. We know that $N_n/n = O(n^{-1})$ in $\mathbb{R}^1$ and $O(n^{-2/3})$ in
$\mathbb{R}^2$ (from (2) of Theorem 3.1). What are the ratios in $\mathbb{R}^k$ when $k \ge 3$? A less ambitious
problem is to find the increasing rate of these ratios $\{r_1, r_2, \ldots\}$, where $r_k$ stands for
the ratio in $\mathbb{R}^k$.

3.3  The  Missile  Problems 

Suppose $n$ missiles are delivered and land in a certain target area, which is usually much
larger than the “effective area” caused by the explosion of a single missile. The “effective
area” here can be referred to as a “covered area” in the present terminology. The problems
we are interested in are: (1) If the $(n+1)$th missile is fired, what is the chance that this
additional missile would involve area which was not covered previously? (2) How large is
the newly covered area? (3) How many more missiles need to be fired in order to cover
90% of the target area?

To answer these types of questions, we introduce a simple model which seems to reflect
the real situation reasonably closely.

Let $A$ denote the target area where the missiles would fall. Assuming that the landing
locations of all missiles are independent of each other and follow a certain unknown
distribution $G$ over $A$, let $Y_1, \ldots, Y_n$ denote these $n$ landing points. For each landing
point $Y_i$, there is a covered area $B(Y_i, r_i)$ associated with $Y_i$, where $B(Y_i, r_i)$ denotes the
intersection of $A$ and the disk with center $Y_i$ and random radius $r_i$. Note that each $r_i$
may depend upon $Y_i$, but $r_i$ and $r_j$ are independent for different $i, j$ since $Y_i$ and $Y_j$ are
independent. If we let $X_i = B(Y_i, r_i)$ and $g(X_i) = Y_i$ for all $1 \le i \le n$, it is clear that
the current model is within the framework of our general coverage problem described in
Section 2.

The chance that the $(n+1)$th missile would land in an “uncovered area” can be written
as

$$P(g(X_{n+1}) \notin S_n \mid S_n), \quad \text{where } S_n = S_n(X_1, \ldots, X_n) = \bigcup_{i=1}^{n} X_i. \tag{3.3.1}$$

From Section 2, the $(n-1)$-estimate is

$$\frac{\#\text{ of } \{Y_i : Y_i \notin \bigcup_{j \ne i} X_j\}}{n}, \tag{3.3.2}$$

where

$$\bigcup_{j \ne i} X_j = S_{n-1,i}.$$

Let us define $n_1(S_n) = \#$ of $\{Y_j : Y_j \notin S_{n-1,j}\}$ for brevity, and the $(n-1)$-estimate in
(3.3.2) can thus be written as $n_1(S_n)/n$. Applying the similar idea of (3.1.4)-(3.1.7) to the
current case, we come up with a “better estimate”

$$\frac{n_1(S_n) + \frac{1}{n}\sum_{j=1}^{n}\bigl(n_1(S_n) - n_1(S_{n-1,j})\bigr)}{n+1}. \tag{3.3.2'}$$
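The two estimates (3.3.2) and (3.3.2') can be computed directly from the landing points and radii. The sketch below assumes each covered area is a full disk (the intersection with the target area $A$ is ignored, which does not matter for deciding whether a landing point is covered), and it reads $n_1(S_{n-1,j})$ as the same count recomputed with missile $j$ removed. The function names are hypothetical.

```python
def covered(point, center, radius):
    """True if `point` lies in the closed disk with the given center and radius."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return dx * dx + dy * dy <= radius * radius


def n1(centers, radii, skip=None):
    """Number of landing points not covered by any *other* missile's disk.

    If `skip` is an index, that missile is removed from the sample first.
    """
    idx = [i for i in range(len(centers)) if i != skip]
    count = 0
    for i in idx:
        if not any(covered(centers[i], centers[j], radii[j]) for j in idx if j != i):
            count += 1
    return count


def coverage_estimates(centers, radii):
    """Return the (n-1)-estimate n1/n and the adjusted estimate of (3.3.2')."""
    n = len(centers)
    n1_full = n1(centers, radii)
    basic = n1_full / n
    # average increment of n1 when one more missile is added to a sample of size n-1
    avg_increment = sum(n1_full - n1(centers, radii, skip=j) for j in range(n)) / n
    better = (n1_full + avg_increment) / (n + 1)
    return basic, better


# Three missiles: the first two cover each other's landing point, the third is isolated.
centers = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
radii = [1.0, 1.0, 1.0]
basic, better = coverage_estimates(centers, radii)
print(basic, better)
```

The leave-one-out loop costs $O(n^3)$ distance checks in this naive form, which is adequate for illustration; a spatial index would be needed for large $n$.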


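To estimate the size of the area newly covered by the $(n+1)$th missile, it is easy to deduce
from (ii) in Section 2 that the $(n-1)$-estimate is

$$\frac{1}{n}\sum_{i=1}^{n} \mathrm{vol}(X_i \setminus S_{n-1,i}) = \frac{v_1(S_n)}{n} \quad \text{(say)}, \tag{3.3.3}$$

where

$$v_1(S_n) = \sum_{i=1}^{n} \mathrm{vol}(X_i \setminus S_{n-1,i}).$$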
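Similarly, one can deduce a “better estimate”, which is

$$\frac{v_1(S_n) + \frac{1}{n}\sum_{j=1}^{n}\bigl(v_1(S_n) - v_1(S_{n-1,j})\bigr)}{n+1}, \tag{3.3.3'}$$

where

$$v_1(S_{n-1,j}) = \sum_{i \ne j} \mathrm{vol}(X_i \setminus S_{n-2,ij}) \quad \text{for } 1 \le j \le n,$$

and

$$S_{n-2,ij} = \bigcup_{k \ne i,j} X_k.$$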
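The area-based estimates can be approximated numerically. The sketch below rests on the observation that the sum of $\mathrm{vol}(X_i \setminus S_{n-1,i})$ over $i$ equals the area covered by exactly one disk, and it approximates that area by counting grid cells. The square target area $[0, \text{side}]^2$, the grid resolution, and all function names are assumptions made for illustration only.

```python
import math


def exactly_once_area(centers, radii, side=1.0, grid=200, skip=None):
    """Grid approximation of v_1: area of [0, side]^2 covered by exactly one disk.

    If `skip` is an index, that missile is left out (leave-one-out version).
    """
    h = side / grid
    idx = [i for i in range(len(centers)) if i != skip]
    cells = 0
    for a in range(grid):
        x = (a + 0.5) * h            # cell midpoint, x-coordinate
        for b in range(grid):
            y = (b + 0.5) * h        # cell midpoint, y-coordinate
            hits = 0
            for i in idx:
                dx, dy = x - centers[i][0], y - centers[i][1]
                if dx * dx + dy * dy <= radii[i] ** 2:
                    hits += 1
                    if hits > 1:     # covered at least twice: cannot count
                        break
            if hits == 1:
                cells += 1
    return cells * h * h


def new_area_estimates(centers, radii, side=1.0, grid=200):
    """Return v_1/n and the adjusted estimate in the form of (3.3.3')."""
    n = len(centers)
    v1_full = exactly_once_area(centers, radii, side, grid)
    basic = v1_full / n
    avg_increment = sum(
        v1_full - exactly_once_area(centers, radii, side, grid, skip=j)
        for j in range(n)
    ) / n
    better = (v1_full + avg_increment) / (n + 1)
    return basic, better


# Two disjoint disks of radius 0.1 inside the unit square.
area_basic, area_better = new_area_estimates([(0.25, 0.25), (0.75, 0.75)], [0.1, 0.1])
print(area_basic, area_better, math.pi * 0.01)
```

With two disjoint disks each missile contributes its full area, so both estimates are close to $\pi r^2$; overlapping disks reduce the exactly-once area and hence the estimate.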
4. Some limit theorems in species problem

In this section we shall present some large sample results for the various estimators
derived from our interpretation in the species problem. The material of this section is
somewhat technical. The idea used and the results obtained in this section are not limited
to the species problem alone. With additional effort, the idea is expected to extend to a
more general situation which may cover all cases discussed in Section 3. However, in order
to present the results simply and clearly, we shall focus on the species problem.

Recall from Section 3.1 that $\{X_j = i\} \iff$ the $j$th trial results in outcome $e_i \in
\{e_1, e_2, \ldots\}$ = outcome space. If for each outcome $e_i$ there is a real value $y_i$ (or a real vector
$y_i$) associated with it, then we may ask the question: “Can one estimate the parameter
associated with the unobserved species?” The general solution to this question will become
apparent after we consider the following two simple examples.

Let $Y_j = y_i$ if $X_j = i$. The observed data are thus $\{(X_j, Y_j),\ 1 \le j \le n\}$. The outcome
space is $\{(e_i, y_i)\}$.

Example  4.1.  The  mean. 

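In this case we are interested in the conditional mean of unobserved outcomes given
$\{(X_i, Y_i)\}_{i=1}^{n}$, i.e.,

$$\int y\,dP(y \mid \mathbf{Y}_n), \quad \text{where } \mathbf{Y}_n = (Y_n, \ldots, Y_1), \tag{4.1}$$

$$P(E \mid \mathbf{Y}_n) = \sum_{y_j \in E} p_j\,\varphi_j(0;n) \Big/ \sum_{j=1}^{\infty} p_j\,\varphi_j(0;n), \tag{4.2}$$

and $E$ is any Borel set in $\mathbb{R}$ (or in $\mathbb{R}^k$ if $y$ is a vector in $\mathbb{R}^k$). The conditional distribution
function corresponding to $P(E \mid \mathbf{Y}_n)$ can thus be written as

$$P(y \mid \mathbf{Y}_n) = P((-\infty, y] \mid \mathbf{Y}_n)$$

if the $\{y_j\}$ are real-valued.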

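To estimate (4.1), we appeal to the interpretation. It is clear (from the interpretation)
that the $(n-1)$-estimates of

$$\sum_{j=1}^{\infty} p_j\,\varphi_j(0;n)\,y_j \quad \text{and} \quad \sum_{y_j \in E} p_j\,\varphi_j(0;n)$$

are

$$\frac{1}{n}\sum_{i=1}^{n} I_{S^c_{n-1,i}}(X_i)\,Y_i$$

and

$$\frac{1}{n}\sum_{i=1}^{n} I_{S^c_{n-1,i}}(X_i)\,I(Y_i \in E)$$

respectively, where $I_{S^c_{n-1,i}}(X_i)$ is the indicator that $X_i \notin S_{n-1,i}$. Recall that $S_{n-1,i} = \bigcup_{j \ne i} \{X_j\}$. A natural $(n-1)$-estimate of $P(E \mid \mathbf{Y}_n)$ is

$$\hat{P}(E \mid \mathbf{Y}_n) = \Bigl[\frac{1}{n}\sum_{i=1}^{n} I_{S^c_{n-1,i}}(X_i)\,I(Y_i \in E)\Bigr] \Big/ \Bigl(\frac{n_1}{n}\Bigr). \tag{4.3}$$

The final $(n-1)$-estimate of the conditional mean (4.1) is thus

$$\int y\,d\hat{P}(y \mid \mathbf{Y}_n) = \frac{1}{n_1}\sum_{j=1}^{n} I_{S^c_{n-1,j}}(X_j)\,Y_j.$$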

This simply tells us that, to estimate the conditional mean of unseen species, one should
use the sample mean of the corresponding observations which occur only once in the sample.

Example 4.2. The median.

In this case we are interested in the median of $\{y_j : j \notin \{X_1, \ldots, X_n\}\}$. From the
interpretation again, it is easy to check that the $(n-1)$-estimate is simply the sample median
of those $Y_i$ whose corresponding $X_i$ occurs only once in the sample.
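Both examples reduce to summarizing the $Y$ values attached to species seen exactly once. A minimal sketch with made-up data (the species labels and values below are purely illustrative, and the function name is hypothetical):

```python
from collections import Counter
from statistics import mean, median


def singleton_values(xs, ys):
    """Y values of the observations whose species label occurs exactly once in xs."""
    counts = Counter(xs)
    return [y for x, y in zip(xs, ys) if counts[x] == 1]


# Illustrative sample: species labels and the value attached to each species.
xs = ["a", "b", "a", "c", "d", "b", "e"]
y_of = {"a": 1.0, "b": 2.0, "c": 10.0, "d": 4.0, "e": 7.0}
ys = [y_of[x] for x in xs]

single = singleton_values(xs, ys)    # values for c, d, e
mean_hat = mean(single)              # estimate of the conditional mean of unseen species
median_hat = median(single)          # estimate of the conditional median
unseen_mass_hat = len(single) / len(xs)  # n_1 / n, the estimated unobserved probability
print(single, mean_hat, median_hat, unseen_mass_hat)
```

Species seen twice or more ("a" and "b" here) do not enter either estimate; only the singletons carry information about what has not yet been observed.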

From these two examples it is not difficult to answer a more general question. If we are
interested in a parameter $\theta = \theta(P(\cdot \mid \mathbf{Y}_n))$, which is a smooth function of $P(\cdot \mid \mathbf{Y}_n)$ as defined
in (4.2), the naive $(n-1)$-estimate is thus $\hat\theta = \theta(\hat{P}(\cdot \mid \mathbf{Y}_n))$. Just how good is $\hat\theta$ as an estimate
of $\theta$? The success of estimating $\theta(P(\cdot \mid \mathbf{Y}_n))$ by $\theta(\hat{P}(\cdot \mid \mathbf{Y}_n))$ depends upon the magnitude of
$\sum_{j=1}^{\infty} p_j\,\varphi_j(0;n)$, the total unobserved probability, which is estimated by $n_1/n$. The following
propositions provide some theoretical justification of this estimate.

Proposition 4.1. Assuming that

$$E Y^2 < \infty, \qquad n^{-1}\Bigl(\sum_i p_i(1-p_i)^n\Bigr)^{-2} = o(1),$$

and that

$$\int y\,dP(y \mid \mathbf{Y}_n)$$

stays bounded in probability, then

$$\Bigl|\int y\,d\hat{P}(y \mid \mathbf{Y}_n) - \int y\,dP(y \mid \mathbf{Y}_n)\Bigr| \to 0$$

in probability as $n \to \infty$.

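Let

$$F_n(y) = F(y \mid \mathbf{Y}_n) = \sum_{y_j \le y} p_j\,\varphi_j(0;n) \Big/ \sum_{j=1}^{\infty} p_j\,\varphi_j(0;n).$$

The estimate $\hat{F}_n(y) = \hat{F}(y \mid \mathbf{Y}_n)$ can be written as

$$\frac{1}{n}\sum_{i=1}^{\infty} \psi_{i,n}\,I(y_i \le y) \Big/ \frac{n_1}{n} = \frac{1}{n_1}\sum_{i \in A_n} I(y_i \le y), \tag{4.5}$$

where

$$\psi_{i,n} = \begin{cases} 1 & \text{if } i \text{ appears exactly once in } \{X_1, X_2, \ldots, X_n\},\\ 0 & \text{otherwise,}\end{cases}$$

and

$$A_n = \{X_i : \psi_{X_i,n} = 1\}.$$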
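The estimate in (4.5) is simply the empirical distribution function of the values attached to singleton species. A minimal sketch, with a hypothetical function name and illustrative data:

```python
from bisect import bisect_right
from collections import Counter


def conditional_cdf_estimate(xs, ys):
    """Return a function y -> F_hat(y): the empirical CDF of the Y values whose
    species label occurs exactly once in xs."""
    counts = Counter(xs)
    single = sorted(y for x, y in zip(xs, ys) if counts[x] == 1)
    n1 = len(single)
    if n1 == 0:
        raise ValueError("no species occurs exactly once; the estimate is undefined")

    def F_hat(y):
        # number of singleton values <= y, divided by n_1
        return bisect_right(single, y) / n1

    return F_hat


# Illustrative sample: singletons are c, d, e with values 10, 4, 7.
labels = ["a", "b", "a", "c", "d", "b", "e"]
values = [1.0, 2.0, 1.0, 10.0, 4.0, 2.0, 7.0]
F_hat = conditional_cdf_estimate(labels, values)
print(F_hat(3.0), F_hat(4.0), F_hat(7.0), F_hat(100.0))
```

The guard for $n_1 = 0$ reflects that the ratio in (4.5) has no meaning when no singleton is observed.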

The following proposition shows that, as an estimate of $F_n(y)$, $\hat{F}_n(y)$ is uniformly
consistent.

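Proposition 4.2. Assuming that

$$n^{-1}\Bigl(\sum_{i=1}^{\infty} p_i(1-p_i)^n\Bigr)^{-2} = o(1),$$

then

$$\sup_y \bigl|\hat{F}_n(y) - F_n(y)\bigr| \to 0 \quad \text{in probability as } n \to \infty.$$

(1)

$$\Bigl[\sum_i p_i\,\varphi_i(0;n) - \sum_i p_i(1-p_i)^n\Bigr] \Big/ \sum_i p_i(1-p_i)^n = o_p(1).$$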


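Proof of (1). It suffices to show

$$E\Bigl[\sum_i p_i\,\varphi_i(0;n) - \sum_i p_i(1-p_i)^n\Bigr]^2 = O\Bigl(\frac{1}{n}\Bigr). \tag{4.6}$$

To see this, it is easy to check that, under the assumption, the LHS of (4.6) is bounded by

$$\sum_{i \ne j} p_i p_j (1 - p_i - p_j)^n - \sum_{i \ne j} p_i p_j (1-p_i)^n (1-p_j)^n + O\Bigl(\frac{1}{n}\Bigr)$$

$$\le \sum_{i \ne j} p_i p_j (1 - p_i - p_j)^{n-1}(p_i + p_j) + O\Bigl(\frac{1}{n}\Bigr)$$

$$\le 2\sum_i p_i^2 \sum_{j \ne i} p_j (1 - p_i)^{n-1} + O\Bigl(\frac{1}{n}\Bigr)$$

$$\le 2\sum_i p_i^2 (1 - p_i)^{n-1} + O\Bigl(\frac{1}{n}\Bigr) = O\Bigl(\frac{1}{n}\Bigr) \quad \text{(by Lemma 1)}.$$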


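This completes the proof of (1). Since the proof of (2) is quite similar, we omit it.

Lemma 3. Under the assumptions $n^{-1}\bigl(\sum_i p_i(1-p_i)^n\bigr)^{-2} = o(1)$ and $EY^2 < \infty$, we have

$$E\Bigl[\frac{1}{n}\sum_{i=1}^{n} I_{S^c_{n-1,i}}(X_i)\,Y_i - \sum_i p_i\,\varphi_i(0;n)\,y_i\Bigr]^2 = O\Bigl(\frac{1}{n}\Bigr).$$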


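Proof. It is easy to see that

$$\frac{1}{n}\sum_{i=1}^{n} I_{S^c_{n-1,i}}(X_i)\,Y_i$$

can be written as

$$\frac{1}{n}\sum_i \psi_{i,n}\,y_i.$$

Now,

$$E\Bigl(\frac{1}{n}\sum_i \psi_{i,n}\,y_i\Bigr)^2 = \frac{1}{n^2}\Bigl[n\sum_i p_i(1-p_i)^{n-1}y_i^2 + n(n-1)\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-2}y_i y_j\Bigr]$$

$$= \frac{1}{n}\sum_i p_i(1-p_i)^{n-1}y_i^2 + \frac{n-1}{n}\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-2}y_i y_j,$$

$$E\Bigl(\sum_i p_i\,\varphi_i(0;n)\,y_i\Bigr)^2 = \sum_i p_i^2(1-p_i)^n y_i^2 + \sum_{i \ne j} p_i p_j(1 - p_i - p_j)^n y_i y_j,$$

and the cross term is

$$E\Bigl[\Bigl(\frac{1}{n}\sum_i \psi_{i,n}\,y_i\Bigr)\Bigl(\sum_j p_j\,\varphi_j(0;n)\,y_j\Bigr)\Bigr] = \sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-1}y_i y_j,$$

so it follows that

$$n\,E\Bigl[\frac{1}{n}\sum_i \psi_{i,n}\,y_i - \sum_i p_i\,\varphi_i(0;n)\,y_i\Bigr]^2 \tag{4.8}$$

$$= \sum_i p_i(1-p_i)^{n-1}y_i^2 + (n-1)\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-2}y_i y_j - 2n\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-1}y_i y_j$$

$$\quad + n\sum_i p_i^2(1-p_i)^n y_i^2 + n\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^n y_i y_j$$

$$= \sum_i p_i(1-p_i)^{n-1}y_i^2 + n\sum_i p_i^2(1-p_i)^n y_i^2 - \sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-2}y_i y_j + O(1) \quad \text{(by Lemma 1')}.$$

(This follows from Lemma 1, $EY^2 < \infty$, and a similar argument in (4.7).)

Note that

$$n\sum_{i \ne j} p_i p_j(1 - p_i - p_j)^{n-2}(p_i + p_j)^2\,|y_i y_j| \le 4n\sum_i p_i^2(1-p_i)^{n-2}|y_i|\,\Bigl[\sum_j p_j|y_j|\Bigr]$$

$$\le n\,E|Y| \cdot O\Bigl(\frac{1}{n}\Bigr) \quad \text{(by Lemma 1')}$$

$$= O(1).$$

It follows that (4.8) is

$$\le \sum_i p_i\,y_i^2 + O(1) < \infty.$$

This completes the proof of the lemma.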

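Proof of Proposition 4.1.
Rewrite

$$\int y\,d\hat{P}(y \mid \mathbf{Y}_n) - \int y\,dP(y \mid \mathbf{Y}_n)$$

as

$$D_n = \frac{b_n + \delta_n}{a_n + \varepsilon_n} - \frac{b_n}{a_n}, \tag{4.9}$$

where

$$a_n = \sum_i p_i\,\varphi_i(0;n), \qquad b_n = \sum_i p_i\,\varphi_i(0;n)\,y_i,$$

$$\delta_n = \frac{1}{n}\sum_i \psi_{i,n}\,y_i - b_n, \qquad \varepsilon_n = \frac{n_1}{n} - a_n.$$

$D_n$ can be further written as

$$\frac{a_n\delta_n - b_n\varepsilon_n}{(a_n + \varepsilon_n)\,a_n} = \frac{\delta_n}{a_n + \varepsilon_n} - \frac{b_n\varepsilon_n}{(a_n + \varepsilon_n)\,a_n}. \tag{4.10}$$

By Lemma 2, $n^{-1/2}(a_n)^{-1} = o_p(1)$. Since $\delta_n = O_p(n^{-1/2})$ by Lemma 3, and $\varepsilon_n = o_p(a_n)$ by
Lemma 2, it follows that

$$\frac{\delta_n}{a_n + \varepsilon_n} = o_p(1), \tag{4.11}$$

$$\frac{b_n\varepsilon_n}{(a_n + \varepsilon_n)\,a_n} = o_p(1) \quad \text{(since } b_n a_n^{-1} = O_p(1) \text{ by assumption)}. \tag{4.12}$$

The proposition follows immediately from (4.11) and (4.12).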
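Proof of Proposition 4.2.
It is easy to check that

$$E\Bigl(\frac{1}{n}\sum_i \psi_{i,n}\,I(y_i \le y)\Bigr) = \sum_i p_i(1-p_i)^{n-1}\,I(y_i \le y), \tag{4.13}$$

$$E\Bigl(\sum_{y_i \le y} p_i\,\varphi_i(0;n)\Bigr) = \sum_i p_i(1-p_i)^{n}\,I(y_i \le y). \tag{4.14}$$

From Lemma 1, it is easy to see

$$\Bigl|\sum_i p_i(1-p_i)^{n-1}\,I(y_i \le y) - \sum_i p_i(1-p_i)^{n}\,I(y_i \le y)\Bigr| = O\Bigl(\frac{1}{n}\Bigr).$$

Furthermore, with a similar argument as in Lemma 3, one can show that

$$n\,E\Bigl[\frac{1}{n}\sum_i \psi_{i,n}\,I(y_i \le y) - \sum_i p_i\,\varphi_i(0;n)\,I(y_i \le y)\Bigr]^2 \le M < \infty \tag{4.15}$$

for some positive $M$, independent of $y$. Proposition 4.2 is an immediate consequence of
(4.15).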
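The $O(1/n)$ gap between (4.13) and (4.14) can be checked numerically. Taking $y = \infty$, the gap equals $\sum_i p_i^2(1-p_i)^{n-1}$, and since $p(1-p)^{n-1} \le 1/n$ for every $p \in [0,1]$, it is at most $1/n$. The sketch below verifies this for an illustrative truncated distribution with $p_i \propto i^{-2}$ (an assumption made only for the example; the function name is hypothetical).

```python
def expectation_gap(p, n):
    """sum_i p_i (1-p_i)^(n-1) - sum_i p_i (1-p_i)^n, written as sum_i p_i^2 (1-p_i)^(n-1)."""
    return sum(pi * pi * (1.0 - pi) ** (n - 1) for pi in p)


# Illustrative species distribution: p_i proportional to 1/i^2, truncated at K species.
K = 5000
weights = [1.0 / (i * i) for i in range(1, K + 1)]
total = sum(weights)
p = [w / total for w in weights]

scaled_gaps = []
for n in (10, 100, 1000, 10000):
    direct = (sum(pi * (1.0 - pi) ** (n - 1) for pi in p)
              - sum(pi * (1.0 - pi) ** n for pi in p))
    gap = expectation_gap(p, n)
    assert abs(direct - gap) < 1e-12   # the two expressions agree
    scaled_gaps.append(n * gap)        # stays bounded by 1 if the gap is O(1/n)
    print(n, gap, n * gap)
```

The printed values of $n \times \text{gap}$ remain below 1 for every $n$, in line with the elementary bound above.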


Acknowledgement.  I  wish  to  express  my  sincere  thanks  to  Professor  Herman  Chernoff.  The 
conversations  I  have  had  with  him  during  this  period  of  investigation  were  most  valuable  to 
me. I also wish to thank Professors Frederick Mosteller, Arthur Dempster, Donald Rubin,
Persi  Diaconis,  and  Arthur  Cohen  for  the  constructive  comments  they  made  during  this 
study.  Thanks  also  to  Dr.  B.  H.  Juang,  from  Bell  Labs,  who  mentioned  the  language 
model  to  me  and  sent  several  useful  references. 




REFERENCES 


Bahl, L.R., Jelinek, F., and Mercer, R.L. (1983) “Maximum likelihood approach to continuous
speech recognition.” IEEE Trans. Pattern Analysis and Machine Intelligence
5, No. 2, 179-190.

Bickel, P.J. and Yahav, J.A. (1986) “On estimating the total probability of the unobserved
outcomes of an experiment.” Adaptive Statistical Procedures and Related Topics.
IMS Lecture Notes. Edited by Van Ryzin. 332-337.

Buchta, C. (1984) “Stochastische Approximation konvexer Polygone.” Z. Wahrscheinlichkeitstheor.
Verw. Geb. 67, 283-304.

Clayton, M. and Frees, E. (1987) “Nonparametric estimation of the probability of discovering
a new species.” JASA 82, 305-311.

Cohen, A. and Sackrowitz, H. (1987) “On estimating the probability of unobserved outcomes.”
Preprint.

Dempster, A., Laird, N., and Rubin, D. (1977) “Maximum likelihood from incomplete data
via the EM algorithm.” (with discussion) JRSS, B 39, No. 1, 1-38.

Diaconis, P. and Stein, C. (1983) “Decision theory.” Lecture notes, Stanford University.

Eddy, W.F. and Gale, J.D. (1981) “The convex hull of a spherically symmetric sample.”
Adv. Appl. Prob. 13, 751-763.

Efron, B. (1965) “The convex hull of a random set of points.” Biometrika 52, 331-343.

Efron, B. and Thisted, R. (1976) “Estimating the number of unseen species: how many
words did Shakespeare know?” Biometrika 63, No. 3, 435-447.

Esty, W.W. (1986) “The efficiency of Good’s nonparametric coverage estimator.” Annals
of Statistics 14, 1257-1260.

Geffroy, J. (1959) “Contribution à la théorie des valeurs extrêmes.” Publ. Inst. Stat. Univ.
Paris VIII, 123-185.

Geffroy, J. (1961) “Localisation asymptotique du polyèdre d’appui d’un échantillon Laplacien
à k dimensions.” Publ. Inst. Stat. Univ. Paris X, 212-228.

Good, I.J. (1953) “The population frequencies of species and the estimation of population
parameters.” Biometrika 40, 237-264.

Good, I.J. and Toulmin, G. (1956) “The number of new species, and the increase in
population coverage, when a sample is increased.” Biometrika 43, 45-63.

Groeneboom, P. (1988) “Limit theorems for convex hulls.” Probability Theory and Related
Fields 79, 329-368.

Jelinek, F. (1976) “Continuous recognition by statistical method.” IEEE Proceedings 64,
No. 4.

Katz, S.M. (1987) “Estimation of probability from sparse data for the language model
component of speech recognizer.” IEEE Trans. Acoustics Speech and Signal Processing,
400-401.

Raynaud, H. (1970) “Sur l’enveloppe convexe des nuages de points aléatoires dans R^n.” J.
Appl. Probab. 7, 35-48.




Rényi, A. and Sulanke, R. (1963) “Über die konvexe Hülle von n zufällig gewählten Punkten.”
Z. Wahrscheinlichkeitstheor. Verw. Geb. 2, 75-84.

Robbins, H. (1956) “An empirical Bayes approach to statistics.” Proc. 3rd Berkeley Symp.
1, 137-163.

Robbins, H. (1968) “Estimating the total probability of the unobserved outcomes of an
experiment.” Ann. Math. Statist. 39, 256-257.

Robbins, H. (1977) “Prediction and estimation for the compound Poisson distribution.”
Proc. Nat. Acad. Sci. USA 74, 2670-2671.

Schneider, R. (1987) “Random approximation of convex sets.” Preprint. Mathematical
Institute, Albert-Ludwigs Univ., FRG.

Starr, N. (1979) “Linear estimation of the probability of discovering a new species.” Ann.
Statist. 7, 644-652.




